ML to help solve FOL model-checking problems tabula rasa

Main Finding so far:

Neural MCTS solves specific combinatorial problem instances through gamification (We tested HSR [The Joy of Egg-Dropping Sniedovich (2003)] and QSAT [Kleinberg/Tardos]).

Introduction to Zermelo Gamification

Authors: Karl Lieberherr and Ruiyang Xu

Our research is inspired by AlphaZero's superhuman capabilities to play Chess and Go using a combination of MCTS (Monte Carlo Tree Search) and neural networks. We believe that the neural MCTS algorithm used in AlphaZero can be adapted to search effectively in other search spaces. We currently focus on using neural MCTS to searching for solutions to first-order model-checking problems (which include many combinatorial problems) and hope to get a surprising search performance similar to AlphaZero for Chess. Our goal is to search for a solution tabula rasa, i.e., only using the mathematical problem definition. While any first-order model-checking problem can be translated into a game (hint: use the semantic game defined by the predicate logic sentence of the model-checking problem), the resulting games usually have action spaces that are too large for a direct application of neural MCTS.

The neural MCTS algorithm builds "self-teaching" neural nets that learn from asymmetric self-game-play. Go and Chess are symmetric games where a solution to play White optimally can be used to derive a solution to play Black optimally. However, the games which come from model-checking problems are not necessarily symmetric. The two players might have significantly different tasks. This affects self-game-play because the two players have tasks of different complexities.

We have succeeded in properly translating certain combinatorial problems to inputs of the neural MCTS algorithm so that the algorithm converges to find the winning strategy for both players and correctly solves the combinatorial problem for a specific instance. We prove this by evaluating an AlphaZero style algorithm in a ground truth setting.

Although currently each problem instance is solved separately, we hope that we can get a more general solution that works on a family of related instances.

We call our framework starting from a combinatorial problem, creating a game, solving the game with neural MCTS and mapping the winning strategy back to a solution of the combinatorial problem, the Zermelo Gamification (ZG). It is a new framework for solving combinatorial problems leveraging the outstanding search capabilities of neural MCTS as demonstrated by Google’s AlphaZero.

Gamification is a well studied topic when the players are human neural networks. We attempt a new kind of gamification where the players are artificial neural networks. For human neural networks the focus is on motivating humans to contribute by making the game motivating. Gamification for human neural networks is used to train or educate humans in a specific domain (cite Magy Seif el-Nasr) or to use them as problem solvers (cite Seth Cooper, Fold-it). For artificial neural networks the focus is on creating a game and choosing neural networks for learning and playing the game that are the perfect fit for solving the problem at hand.

Our objective is not necessarily to improve the state-of-the-art of hand-crafted problem solvers in specific areas but to illustrate that there is a generic algorithm (neural MCTS) that can solve well-known problems tabula-rasa. The hope is that neural MCTS will help solve future algorithmic problems that have not yet been solved by humans. We view Neural MCTS as a helper in human solving of algorithmic problems in the future.

Note 1: Gamification is traditionally used to encourage participation by humans. We use gamification to make problems solvable by machines. This puts new constraints on how the gamification is done using the basic quantifier-based gamification from semantic games [Hintikka]. Zermelo Gamification is about how to gamify and how to design the neural MCTS architecture so that neural MCTS can succeed in solving an instance of the FOL model-checking problem.Note 2: We name the gamification after the logician Ernst Zermelo who is well known for Zermelo's Theorem for Chess and Zermelo-Fraenkel set theory.

Justification of the Zermelo Gamification (explanation capability of neural networks)

Why don’t we try to solve the combinatorial problem directly rather than by translating it into a game using a Zermelo Gamification? ML is often accused of producing black box solutions that don’t explain themselves. We design the neural network with enough hooks so that we can better observe what the neural network does. The hooks are created through the game transformation that trades a complex question (e.g., is there a solution) for a series of simpler questions (e.g., am I losing when I make this choice?). There is a trend in ML to use neural networks with hooks to make them more understandable. See Hanspeter Pfister's work at Harvard.

With the constructive nature of the games we get through the semantic games approach we can explore why questions. For example, Why does this choice not lead to a solution? You can play the game, and you see an example gameplay that indeed does not lead to a solution. The advantage of gamification is that the gameplay demonstrates that there is a solution/ no solution to a specific instance. This constructive demonstration is useful to humans who try to understand the problem. The demonstration is much better than just being told: no, there is no solution. Of course, here we assume that neural MCTS finds the optimal strategy.

This is a key benefit of ZG: the explanatory power of the neural networks is helping to better understand the problem. This has, for example, educational benefits. Our neural networks provide constructive answers both for a positive problem as well as a negative problem (which does not have a solution). For a negative problem, the neural network will provide a demonstration to anyone who tries to construct a solution that the solution attempt will result in a contradiction. This is much more useful than simply the answer no. The improved explanatory power does not provide proof but only a demonstration that the problem is unsolvable. This demonstration is given during an attempt to construct a solution and requires a tightrope walk of the neural networks to produce a contradiction. There is danger that no contradiction is produced and this would make it appear that there is a solution. In our case, the neural networks don't just say the equivalent of this is a cat but they give a step-by-step explanation of why it is a cat.

With the Zermelo Gamification, we generalize the initial construction problem to an initial/complement construction problem. The generalization to the initial/complement construction problem is more useful for problems whose games have many alternating moves. We observe an application of Polya’s inventor’s paradox which says that sometimes a more general question is more comfortable to solve. In our case, the only way that machine learning can succeed is if we attack the initial problem and the complement simultaneously. They need each other to find the solution and converge to the optimal policy. The line graphs of HSR games demonstrate that the Proponent and Opponent help each other to learn to get better over time but with occasional setbacks.

Simulatability (Murdoch et al. 2018). This section is work in progress.

Model-based interpretability
"A model is said to be simulatable if a human is able to internally simulate and reason about its entire decision-making process (i.e., how a trained model produces an output for an arbitrary input."
"we focus on the use of interpretations in the context of ML as part of the larger data science life cycle. We define interpretable machine learning as the use of machine-learning models for the extraction of relevant knowledge about domain relationships contained in data. Here, we view knowledge as being relevant if it provides insight for a particular audience into a chosen domain problem. These insights are often used to guide communication, actions, and discovery". We extract relevant knowledge from the self-game-play data.
"The ability of the interpretations to properly describe what the model has learned is denoted by descriptive accuracy. " This is related to our quest of determining when the learning has converged to the winning strategy for both roles of the player.
"In the context of ML, there are two areas where errors can arise: when approximating the underlying data relationships with a model (predictive accuracy) and when approximating what the model has learned using an interpretation method (descriptive accuracy). For an interpretation to be trustworthy, one should try to maximize both of the accuracies. " For us, descriptive accuracy is reached when the winning strategy is learned. We approximate the underlying data relationships with a learned policy which also predicts the game outcome (predictive accuracy)?

Requirements for the Zermelo Gamification

The Zermelo Gamification is more complicated than it looks. It requires a sophisticated compiler to map the combinatorial problem formulation into a game that neural-MCTS solves successfully. Here are the concerns that apply. For now, we humans have to balance those concerns.

Which neural network architecture should be used to encode the combinatorial problem to guide MCTS? If a set of numbers gives the game board, we need a different encoding than when a graph gives the game board.
How do we control exploration versus exploitation during MCTS? Different games require different controls as described by the AlphaZero paper.
How do we keep the action space of manageable size? This is a common problem for RL applications. There are various game transformations which reduce the action space size yet still lead to a solution to the problem.
Chess is a symmetric game where a winning strategy for White can be easily transformed into a winning strategy for Black (White-to-Black optimal policy transfer). The White-to-Black optimal policy transfer does not apply for games coming from combinatorial optimization problems because the two players might have very different move options (action spaces). It might still be possible to symmetrize such games, but that might require insight into the game which goes way beyond its definition. For games which are not symmetric, one player might have the easier task, and this can influence the speed with which the two players bootstrap each other from random play.
There is a need for a stopping rule to decide that most likely the learning has converged. Without knowing the ground truth, this is a non-trivial task. There is a danger that we stop too soon, before the winning strategy has been found.

One limitation of our current neural MCTS algorithm is that it has competitors like the Alpha-Beta Pruning algorithm. However, we believe that our standard hardware (1 desktop with 4 cores) is the reason that Neural MCTS does not beat Alpha-Beta Pruning as it does for Chess and Go.

Evaluating Zermelo Gamification

We want to measure whether ZG is successful and converges to a winning strategy. For Chess and Go we know that AlphaZero finds a good strategy but we don't know whether it is a winning one. Having a winning strategy means to only make correct moves. A move is correct for player P if it starts in a winning position for P and ends in a winning position for the same player. A move is also correct for player P if it starts in a lost position for P. According to this definition, when you are lost you cannot make a mistake unless your opponent makes a move which is not correct. Then you are in a winning position and you must make all correct moves if you want to stay mistake-free.

Making only correct moves in a winning position is correlated with a high level of skill.

Our publications

Preprints
- Ruiyang Xu, Prashank Kadam, Karl Lieberherr: First-Order Problem Solving through Neural MCTS based Reinforcement Learning, https://arxiv.org/abs/2101.04167
Workshops (non-archival but reviewed)
- AAAI 2019 workshop on "Reinforcement Learning in Games" organized by three DeepMind researchers: Marc Lanctot, Julien Perolat, Martin Schmid. Ruiyang Xu and Karl Lieberherr, Learning Self-Game-Play Agents for Combinatorial Optimization Problems.
- AAMAS 2019 workshop on "Adaptive and Learning Agents" organized by Patrick MacAlpine (Microsoft Research, USA), Patrick Mannion (Galway-Mayo Institute of Technology, IE), Bei Peng (Microsoft Research, USA), Roxana Rădulescu (Vrije Universiteit Brussel, BE) . Ruiyang Xu and Karl Lieberherr, Learning Self-Game-Play Agents for Combinatorial Optimization Problems.
Conferences
- Ruiyang Xu , Karl Lieberherr, Learning Self-Game-Play Agents for Combinatorial Optimization Problems, AAMAS 2019, Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, Pages 2276-2278, (full paper, 8 pages)
- Prashank Kadam, Ruiyang Xu, Karl Lieberherr, Accelerating Neural MCTS Algorithms using Neural Sub-Net Structures, Proceedings of the Adaptive Agents and Multi-Agent Systems Conference (AAMAS 2023), Pages 237-2639.
Journals
- Ruiyang Xu , Karl Lieberherr, Learning Self-Play Agents for Combinatorial Problems and Games, Knowledge Engineering Review, Cambridge University Press, accepted for publication 2019. Pre-publication paper.
  - Publisher's citation: Xu, R., & Lieberherr, K. (2020). Learning self-play agents for combinatorial optimization problems. The Knowledge Engineering Review, 35, E11. doi:10.1017/S026988892000020X
  - Citation: @article{xu_lieberherr_2020, title={Learning self-play agents for combinatorial optimization problems}, volume={35}, DOI={10.1017/S026988892000020X}, journal={The Knowledge Engineering Review}, publisher={Cambridge University Press}, author={Xu, Ruiyang and Lieberherr, Karl}, year={2020}, pages={e11}}

What we build on

Gerald Tesauro: Programming backgammon using self-teaching neural nets http://www.bkgm.com/articles/tesauro/ProgrammingBackgammon.pdf

AlphaGo

AlphaZero

Expert Iteration: Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting https://arxiv.org/pdf/1806.00683.pdf

More Related Work

Elias Khalil: https://papers.nips.cc/paper/7214-learning-combinatorial-optimization-algorithms-over-graphs Uses reinforcement learning and graph embedding.

http://www.cs.utexas.edu/~hunt/class/2015-fall/cs395t/slides/QBF.pdf (QSAT info)

How is overfitting and underfitting an issue with neuralMCTS?

https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/

From AlphaGo to Power System AI by Fangxing Li and Yan Du, IEEE Power and Energy Magazine

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8295075&tag=1

Analysis of HSR:

Moshe Sniedovich, (2003) OR/MS Games: 4. The Joy of Egg-Dropping in Braunschweig and Hong Kong. INFORMS Transactions

on Education 4(1):48-64. http://dx.doi.org/10.1287/ited.4.1.48

Google Sites

Report abuse